Increasing the Effective Memory Bandwidth in Multivector Processors

Authors

  • Anna M. del Corral
  • José María Llabería
Abstract

In the memory system of a multivector processor, interference between concurrent vector streams causes lost cycles, so the effective throughput falls below the required throughput and the work of the functional units is delayed. With the classical order of accessing vector stream elements, a stream references the memory modules with a temporal distribution that depends on the element specification pattern; in general, different specification patterns yield different temporal distributions over the memory modules. Concurrent vector streams that access the memory modules with different temporal distributions can produce memory module conflicts even if the combined request rate of all the concurrent streams to every memory module is less than or equal to its service rate. This paper proposes a new order in which to reference the vector stream elements. The order is imposed by a temporal distribution of memory modules that reduces the average memory access time in vector processors, and that depends only on the memory system configuration and on the memory modules referenced by the vector stream. When the request rate of all the vector streams to every memory module is less than or equal to its service rate, the proposed order achieves the required bandwidth. Under other conditions, it reduces the number of lost cycles and increases the effective throughput.
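The conflict described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's proposed access order; it only models the classical order under assumptions that the abstract does not state: low-order interleaving (module = address mod number of modules), a fixed module busy time, one request per stream per cycle, and fixed-priority arbitration between streams. The function names and parameters are hypothetical.

```python
def module_sequence(base, stride, length, num_modules):
    """Modules referenced, in classical element order, by a strided stream.

    Assumption: low-order interleaving, i.e. module = address mod num_modules.
    """
    return [(base + i * stride) % num_modules for i in range(length)]


def count_lost_cycles(streams, busy_cycles):
    """Count cycles in which some stream is stalled by a busy module.

    Assumptions: every stream tries to issue one request per cycle, a module
    stays busy for busy_cycles after accepting a request, and streams are
    served in list order within a cycle (fixed priority).
    """
    free_at = {}                  # module -> first cycle it can accept again
    pos = [0] * len(streams)      # next element index for each stream
    lost = 0
    cycle = 0
    while any(p < len(seq) for p, seq in zip(pos, streams)):
        for k, seq in enumerate(streams):
            if pos[k] == len(seq):
                continue          # this stream has finished
            module = seq[pos[k]]
            if free_at.get(module, 0) <= cycle:
                free_at[module] = cycle + busy_cycles
                pos[k] += 1       # request accepted
            else:
                lost += 1         # module busy: the stream loses this cycle
        cycle += 1
    return lost
```

As a usage example, take 8 modules with a busy time of 2 cycles, stream A with stride 1 from address 0, and stream B with stride 2 from address 1, 8 elements each. Stream A visits each module once every 8 cycles and stream B visits modules 1, 3, 5 and 7 once every 4 cycles, so the combined request rate to any module is at most 3/8 per cycle, below the service rate of 1/2. Calling `count_lost_cycles([module_sequence(0, 1, 8, 8), module_sequence(1, 2, 8, 8)], 2)` still reports lost cycles in this model, because the two streams reach the shared modules with different temporal distributions.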

Similar resources

A network flow approach to memory bandwidth utilization in embedded DSP core processors

This paper presents a network flow approach to solving the register binding and allocation problem for multiword memory access DSP processors. In recently announced DSP processors, sixteen bit instructions which simultaneously access four words from memory are supported. A polynomial-time network flow methodology is used to allocate multiword accesses, including constant data memory layout, whi...

Impact of System and Cache Bandwidth on Stencil Computations Across Multiple Processor Generations

We compare old single-core multi-processor systems against multi-core processors and study the question which improvements are most relevant for increasing the performance on stencil computations. Even before the multi-core era began, the bandwidth wall, the discrepancy between off-chip bandwidth requirements and system bandwidth performance, was already a significant problem. Because of the cu...

Preliminary Experiments on Similar Executions with Reduced Off-Chip Accesses in Multi-core Processors

With the increasing number of cores in multicore processors, limitations in memory bandwidth are a significant issue. We find that there exists a high degree of similarity across multiple executions of the same application with minor variations in input parameter values or input data sets. In this work, we examine two applications where individual executions on different cores work with differe...

Very High Speed Vectorial Processors Using Serial Multiport Memory as Data Memory

Complex scientific problems involving large volumes of data need a huge computing power. Memory bandwidth remains a key issue in high performance systems. This paper presents an original method of memory organization based on serial multiport memory components which allows simultaneous access to all ports without causing either conflict between them or suspension. The resulting process for info...

When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads

Response time requirements for big data processing systems are shrinking. To meet this strict response time requirement, many big data systems store all or most of their data in main memory to reduce the access latency. Main memory capacities have grown, and systems with 2 TB of main memory capacity are available today. However, the rate at which processors can access this data, the memory bandwidth...

Publication date: 1996